The next four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N = 1.
The most frequent word N-grams at the beginning of sentences give some insight into sentence composition.
For N = 1 in particular, even a small corpus suffices to identify the most frequent sentence beginnings.

```sql
-- N = 1: the first space-delimited token of each sentence
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt
from sentences group by substring_index(sentence, ' ', 1)
order by cnt desc limit 50;
```

Rank | Count | Beginning |
---|---|---|
154911 | 24581 | I |
57009 | 16625 | Det |
125366 | 13382 | Han |
38369 | 10481 | Den |
32608 | 8397 | De |
71803 | 6077 | Dette |
96850 | 5091 | Etter |
88380 | 4806 | EN |
234254 | 4402 | På |
149163 | 3543 | Hun |
28613 | 2743 | Da |
209752 | 2721 | Men |
45830 | 2593 | Denne |
109604 | 2228 | For |
94775 | 2140 | Et |
292356 | 1977 | Ved |
115914 | 1876 | Fra |
286060 | 1845 | Under |
82154 | 1826 | Disse |
135599 | 1694 | Hans |
266691 | 1536 | Som |
222848 | 1517 | Noen |
256635 | 1391 | Selv |
207572 | 1348 | Med |
279136 | 1309 | Til |
218555 | 1266 | Når |
143572 | 1207 | Her |
204021 | 1207 | Mange |
203682 | 1076 | Man |
260823 | 1063 | Siden |
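The following is a minimal sketch of the same computation outside the database. It assumes the sentences are already available as a Python list rather than in the MySQL `sentences` table. `substring_index` is a hypothetical helper that mimics MySQL's `SUBSTRING_INDEX` for positive counts, and `top_beginnings` (also hypothetical) reproduces the group-by / order-by / limit logic. Varying `n` from 1 to 4 therefore gives the sentence beginnings for each of the four subsections. The toy sentences are invented for illustration and are not taken from the corpus.

```python
from collections import Counter
from typing import Iterable


def substring_index(text: str, delim: str, count: int) -> str:
    """Hypothetical helper mimicking MySQL SUBSTRING_INDEX for count >= 0:
    everything left of the count-th occurrence of delim, or the whole
    string if delim occurs fewer than count times."""
    return delim.join(text.split(delim)[:count])


def top_beginnings(sentences: Iterable[str], n: int,
                   limit: int = 50) -> list[tuple[str, int]]:
    """Hypothetical helper: count the first n space-delimited tokens of
    each sentence and return the `limit` most frequent with their counts."""
    counts = Counter(substring_index(s, " ", n) for s in sentences)
    # most_common sorts by count, descending (ties keep first-seen order)
    return counts.most_common(limit)


# Invented toy sentences, for illustration only
sample = [
    "Han kom hjem sent.",
    "Han sov lenge.",
    "Det regnet hele dagen.",
    "I går var det kaldt.",
]
print(top_beginnings(sample, 1, limit=3))  # [('Han', 2), ('Det', 1), ('I', 1)]
for n in range(2, 5):
    print(n, top_beginnings(sample, n, limit=3))
```

As in the SQL query, tokens are split on single spaces only, so punctuation attached to a word (for example `Men,`) is counted as a distinct beginning.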
4.3.1.2 Most Frequent Sentence Beginnings II
4.3.1.3 Most Frequent Sentence Beginnings III
4.3.1.4 Most Frequent Sentence Beginnings IV
4.3.1.1 Most Frequent Sentence Endings I
4.3.1.2 Most Frequent Sentence Endings II
4.3.1.3 Most Frequent Sentence Endings III
4.3.1.4 Most Frequent Sentence Endings IV